In [1]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
In [2]:
jan_core = pd.read_csv("jan_18_core.csv")
feb_core = pd.read_csv("feb_18_core.csv")
mar_core = pd.read_csv("mar_18_core.csv")
apr_core = pd.read_csv("apr_18_core.csv")
may_core = pd.read_csv("may_18_core.csv")
jun_core = pd.read_csv("jun_18_core.csv")

[CRQ2] Visualize taxi movements! Using the folium library and a JSON file containing the coordinates of all NYC taxi zones, we draw a choropleth map for both the pick-up and the drop-off locations of taxi rides.

In [3]:
frames = [jan_core, feb_core, mar_core, apr_core, may_core, jun_core]
In [4]:
#we collect the pick-up and drop-off LocationIDs from all the monthly dataframes
#(pd.concat is used because DataFrame.append was removed in pandas 2.0)
viz = pd.concat([df[["PULocationID","DOLocationID"]] for df in frames])
In [5]:
#we create a column 'counts' with the number of rides in each Location
vizPU = viz.groupby(['PULocationID']).size().reset_index(name='counts')

#adding the zones missing from the Pick Up df with a count of 0: they are not in our data because no rides start there
#(pd.concat is used because DataFrame.append was removed in pandas 2.0)
missing_PU = pd.DataFrame({'PULocationID': [103, 104, 99], 'counts': [0, 0, 0]})
vizPU = pd.concat([vizPU, missing_PU], ignore_index=True)
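The zone IDs padded above are hard-coded. A more general alternative, sketched here on invented toy data, is to reindex the counts against the full list of zone IDs so that any zone without rides gets a count of 0. The `rides` table and the `all_zone_ids` list are assumptions made for illustration; in this notebook the full list would presumably be read from the `LocationID` property of each feature in `taxi_zones.json`.

```python
import pandas as pd

# Toy rides table; the values are invented for illustration only.
rides = pd.DataFrame({"PULocationID": [1, 1, 2, 4, 4, 4]})

# Hypothetical full list of zone IDs (in practice, taken from the GeoJSON features).
all_zone_ids = [1, 2, 3, 4, 5]

counts = (
    rides.groupby("PULocationID").size()
    .reindex(all_zone_ids, fill_value=0)  # zones with no rides get a count of 0
    .rename_axis("PULocationID")          # make sure the index keeps its name
    .reset_index(name="counts")
)
print(counts)
```

With this approach the padding no longer depends on knowing in advance which zones are absent from the data.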
In [6]:
import json
import folium
import os
In [7]:
#loading the json file with the coordinates of nyc zones
with open('taxi_zones.json') as f: #the file is closed automatically after loading
    geo_json = json.load(f)
In [19]:
#PICK UP LOCATION MAP

m = folium.Map(location=[40.7, -74], zoom_start = 10.5)
folium.GeoJson(geo_json).add_to(m) #adding the coordinates given by the json file

#creating the choropleth map and setting parameters
#(folium.Choropleth replaces the deprecated Map.choropleth method)
folium.Choropleth(
    geo_data = geo_json,
    name = 'choropleth',
    data = vizPU,
    columns = ['PULocationID', 'counts'],
    key_on = 'feature.properties.LocationID',
    fill_color = 'BuPu',
    fill_opacity = 0.5,
    line_opacity = 0.3,
    legend_name = "Taxi pick up locations map"
).add_to(m)

This choropleth map shows that the zones most frequented by yellow taxis are in the heart of Manhattan, especially around Central Park: the Upper East Side (Park Avenue), Midtown and the area near Penn Station (the main intercity railroad station in NYC). Yellow cab usage is also high in the area south of Central Park, between 3rd and 9th Avenues (Times Square, the Diamond District and other points of interest). These areas are crowded and attract many tourists, so the flow of people is high throughout.

Across the whole map, Manhattan is in a league of its own compared with the other boroughs, accounting for the great majority of taxi rides. The lower number of pick-ups in northern Manhattan may be because green taxis are allowed to start rides there (north of West 110th Street and East 96th Street) and in all the other boroughs, and can drop passengers off anywhere; those rides are not included in our analysis. The only other dark areas outside Manhattan are two airports, JFK and LaGuardia (both in Queens), respectively the first and third busiest airports serving New York City. The second, Newark Airport, is served mainly by New Jersey's local taxi companies, so it does not stand out on our yellow cab map.
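One way to put a number on how concentrated the rides are is to rank the zones by pick-up count and look at the cumulative share. The following is only a sketch on invented counts, not results from the taxi data; the `zone_counts` table stands in for a dataframe shaped like `vizPU`.

```python
import pandas as pd

# Invented counts for four hypothetical zones (same columns as vizPU).
zone_counts = pd.DataFrame({
    "PULocationID": [10, 20, 30, 40],
    "counts": [600, 300, 80, 20],
})

# Rank zones from busiest to quietest and compute each zone's share of all pick-ups.
ranked = zone_counts.sort_values("counts", ascending=False).reset_index(drop=True)
ranked["share"] = ranked["counts"] / ranked["counts"].sum()
ranked["cum_share"] = ranked["share"].cumsum()

# Share of all pick-ups that start in the two busiest zones.
top2_share = ranked.loc[1, "cum_share"]
print(ranked)
```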

In [20]:
m
Out[20]:
In [10]:
#we create a column 'counts' with the number of rides in each Location
vizDO = viz.groupby(['DOLocationID']).size().reset_index(name='counts')
In [11]:
#adding the zones missing from the Drop Off df with a count of 0, so the map won't show wrong things
#(pd.concat is used because DataFrame.append was removed in pandas 2.0)
missing_DO = pd.DataFrame({'DOLocationID': [103, 104, 110], 'counts': [0, 0, 0]})
vizDO = pd.concat([vizDO, missing_DO], ignore_index=True)
In [21]:
#DROP OFF LOCATION MAP

l = folium.Map(location=[40.7, -74], zoom_start = 10.5)
folium.GeoJson(geo_json).add_to(l) #adding the coordinates given by the json file


#creating the choropleth map and setting parameters
#(folium.Choropleth replaces the deprecated Map.choropleth method)
folium.Choropleth(
    geo_data = geo_json,
    name = 'choropleth',
    data = vizDO,
    columns = ['DOLocationID', 'counts'],
    key_on = 'feature.properties.LocationID',
    fill_color = 'BuPu',
    fill_opacity = 0.5,
    line_opacity = 0.3,
    legend_name = "Taxi drop off locations map"
).add_to(l)

The drop-off map looks much the same as the pick-up choropleth. This is probably because rides start and end in broadly the same places: there is no significant reason for the most popular zones to have flows in only one direction.
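The visual similarity of the two maps could be checked numerically by joining the pick-up and drop-off counts per zone and computing their correlation and the net flow. This is a minimal sketch on invented numbers; the `pu` and `do` tables and their column names are assumptions standing in for `vizPU` and `vizDO`.

```python
import pandas as pd

# Invented per-zone counts, standing in for vizPU and vizDO.
pu = pd.DataFrame({"LocationID": [1, 2, 3, 4], "pickups": [100, 50, 10, 0]})
do = pd.DataFrame({"LocationID": [1, 2, 3, 4], "dropoffs": [90, 60, 15, 5]})

# Outer join keeps zones present in only one table; missing counts become 0.
both = pu.merge(do, on="LocationID", how="outer").fillna(0)

# Pearson correlation between pick-up and drop-off counts across zones.
corr = both["pickups"].corr(both["dropoffs"])

# Positive net flow: more rides start in the zone than end there.
both["net_flow"] = both["pickups"] - both["dropoffs"]
print(both)
print(corr)
```

A correlation close to 1 would support the reading that the two maps show the same pattern, while zones with a large net flow in either direction would be the exceptions worth a closer look.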

In [22]:
l
Out[22]: